Final cleaning and linking of AKB IDs to Yambol Features

The script explores the consistency of datatypes and feature attributes created in previous scripts, streamlines loose ends, and marries the records to their AKB counterparts. Final data are clipped to Yambol bounadry before export in two spatially-deduplicated versions. This script can be run standalone and is final. There is no need to run previous scripts unless one wishes to explore the data in their raw state or change some of the processing.

What is AKB

AKB stands for the “Arheologicheska Karta na Bulgaria”, national digital register, where all archaeological sites and places of interest are entered upon their first investigation. The purpose of the present script is to add AKB identifiers assembled by Todor Valchev to the Yambol field observations. The biggest benefit of this exercise is that the AKB reference lets us see if a given mound was excavated and dated. It also allows users with access to AKB to look up the full cultural heritage information associated with the record.

Sites are often entered into AKB upon their first visit, with additional data added upon revisit, excavations, or other analyses.

The AKB link is one reason why we retain the Yambol data in two spatial versions: _early and _later. While the _later is based on revisits and thus more recent and authoritative (especially if driven by the need to update or improve the record), the _early record IDs were use to create links in AKB and so are more suitable for mapping of cultural heritage data in AKB with the environmental variables present here.

Setup

library(tidyverse)
library(sf)
library(mapview)

Load data - select early or late

Here we load aggregated and deduplicated features. Choose whether you wish to use the _later, most recent observations, or _early, the earliest version, which matches the AKB best. _early version is the default. If you change the default, remember to adjust the export filenames in code chunks in lines 242 and 272.

# features <- readRDS('../output_data/interim/features_dd_later.rds') # most
# recent observations
features <- readRDS("../output_data/interim/features_dd_early.rds")  # first AKB entry
str(features)
Classes 'sf' and 'data.frame':  1466 obs. of  24 variables:
 $ TopoID                 : num  2e+05 2e+05 2e+05 2e+05 2e+05 ...
 $ Note                   : chr  "Diadopetkova mogila" NA NA NA ...
 $ TRAP                   : num  9305 9411 9416 9417 9409 ...
 $ Source                 : chr  "Legacy verification" "Survey" "Legacy verification" "Legacy verification" ...
 $ Type                   : chr  "Burial Mound" "Burial Mound" "Burial Mound" "Burial Mound" ...
 $ LU_Around              : chr  "Scrub" "Scrub" "Annual agriculture" "Pasture" ...
 $ LU_Top                 : chr  "Scrub" "Scrub" "Annual agriculture" "Pasture" ...
 $ Date                   : Date, format: "2010-11-06" "2010-11-07" ...
 $ DiameterMax            : chr  "25" "15" "20" "15" ...
 $ HeightMax              : num  1.8 1.5 2 2 2 2 1 1 6 1 ...
 $ Condition              : chr  "3" "2" "3" "4" ...
 $ PrincipalSourceOfImpact: chr  NA "Nodata" "Agriculture" "Looting" ...
 $ AllNotes               : chr  "stones of the top" "" "part of the north  slope was ploughted" "S part was mainly ploughted, 50 % of the mound was destroyed" ...
 $ geometry               :sfc_POINT of length 1466; first list element:  'XY' num  441127 4710087
 $ distBG                 : Units: [m] num [1:1466, 1] 68354 61029 62183 62265 61453 ...
 $ distTown               : Units: [m] num  9731 1807 1304 1277 1487 ...
 $ distTownBoundary       : Units: [m] num  635 1807 1304 1277 1487 ...
 $ elevAster              : num  NA 242 188 186 215 ...
 $ slopeAster             : num  NA 3.528 3.089 3.12 0.549 ...
 $ aspectAster            : num  NA 133.3 57.2 53.9 53.7 ...
 $ TRI                    : num  NA 1.295 1.184 1.007 0.176 ...
 $ TPI                    : num  NA 5.71e-01 -2.69e-01 -9.54e-06 1.76e-01 ...
 $ rough                  : num  NA 4.609 3.849 4.005 0.801 ...
 $ prom250mbuff           : num  NA 7.17 43.48 12.04 11.15 ...
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:23] "TopoID" "Note" "TRAP" "Source" ...

Validation I: Check how many of the features are actually mounds

# Filter features by type
features %>%
    group_by(Type) %>%
    tally()
Simple feature collection with 9 features and 2 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 9 × 3
  Type                      n                                           geometry
  <chr>                 <int>                                     <GEOMETRY [m]>
1 Burial Mound           1006 MULTIPOINT ((420515 4693580), (420707.1 4690851),…
2 Burial Mound?            91 MULTIPOINT ((422391.7 4698832), (429156.7 4695478…
3 Extinct Burial Mound    188 MULTIPOINT ((420832.5 4689836), (421875.6 4693594…
4 Extinct Burial Mound?    10 MULTIPOINT ((468291.5 4660440), (469551.2 4669012…
5 Other                    71 MULTIPOINT ((426968.4 4693384), (462171.2 4655463…
6 Surface Scatter          64 MULTIPOINT ((421045.4 4690265), (424207.4 4694969…
7 Tell                      3 MULTIPOINT ((465676.3 4675875), (467848.2 4662406…
8 Tell?                     1                           POINT (483400.7 4697154)
9 Uncertain Feature        32 MULTIPOINT ((437475.3 4690926), (445710.4 4699311…
# Verify the Source in the category 'Other' of Type.

features %>%
    filter(Type == "Other") %>%
    group_by(Source) %>%
    tally()
Simple feature collection with 2 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 426968.4 ymin: 4647347 xmax: 496032.5 ymax: 4700030
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 2 × 3
  Source                  n                                             geometry
  <chr>               <int>                                     <MULTIPOINT [m]>
1 Legacy verification    30 ((426968.4 4693384), (462171.2 4655463), (462343 46…
2 Survey                 41 ((462452.2 4662370), (463837.3 4664726), (463869.9 …

In the tally of “Other” types, there are 34 Legacy verification features and 41 Survey features. While the latter are expected, the 34 verified features required follow up on 27 Dec 2022. Inspection showed that many of the verified features originate not from sunbursts but other map markers, such as rayed squares and triangles. We inspected these in the early seasons but stopped doing so after these turned out not to lead to mounds reliably.

What is hiding under “other”?

features %>%
    filter(Type == "Other" & Source == "Legacy verification") %>%
    group_by(PrincipalSourceOfImpact) %>%
    tally()
Simple feature collection with 13 features and 2 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 426968.4 ymin: 4647347 xmax: 496032.5 ymax: 4693384
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 13 × 3
   PrincipalSourceOfImpact                           n                  geometry
   <chr>                                         <int>            <GEOMETRY [m]>
 1 Agriculture                                       3 MULTIPOINT ((482322.1 46…
 2 Construction                                      5 MULTIPOINT ((467436.7 46…
 3 Erosion                                           2 MULTIPOINT ((462171.2 46…
 4 No observation                                    8 MULTIPOINT ((462907.7 46…
 5 Other                                             4 MULTIPOINT ((474493 4669…
 6 Other (Geodetic point )                           1  POINT (470009.1 4671929)
 7 Other (Military activity )                        1  POINT (471560.1 4669909)
 8 Other (Military construction and geodetic ma…     1  POINT (472794.2 4671237)
 9 Other (geodetic marker )                          1  POINT (469472.6 4669824)
10 Other (geodetic marker)                           1  POINT (463869.9 4659734)
11 Other (military activity and geodetic marker…     1    POINT (478912 4682704)
12 Post-depositional                                 1  POINT (426968.4 4693384)
13 Urban development                                 1  POINT (491526.2 4676393)

We can see that a lot of the map symbols led to geodetic points and benchmarks, 8 were not accessible, and other locations contained water wells and other structures, whose setting prevented us from labelling them as extinct mounds

Validation II: Check for attribute duplicates

Spatial duplication is addressed in 06_Enrich.Rmd

features$TRAP[duplicated(features$TRAP)]
numeric(0)

Validation III: Clean up condition, height and other attribute

Condition was documented on Likert scale from 1 to 5 with 1 being pristine and 5 being extinct. 0 denoted No observation.

unique(features$Condition)
[1] "3" "2" "4" "5" "1" "6" "0"
glimpse(features)
Rows: 1,466
Columns: 24
$ TopoID                  <dbl> 200013, 200016, 200018, 200019, 200020, 200022…
$ Note                    <chr> "Diadopetkova mogila", NA, NA, NA, NA, NA, NA,…
$ TRAP                    <dbl> 9305, 9411, 9416, 9417, 9409, 9314, 9315, 9415…
$ Source                  <chr> "Legacy verification", "Survey", "Legacy verif…
$ Type                    <chr> "Burial Mound", "Burial Mound", "Burial Mound"…
$ LU_Around               <chr> "Scrub", "Scrub", "Annual agriculture", "Pastu…
$ LU_Top                  <chr> "Scrub", "Scrub", "Annual agriculture", "Pastu…
$ Date                    <date> 2010-11-06, 2010-11-07, 2010-11-07, 2010-11-0…
$ DiameterMax             <chr> "25", "15", "20", "15", "30", "30", "20", "25"…
$ HeightMax               <dbl> 1.8, 1.5, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 6.0, 1…
$ Condition               <chr> "3", "2", "3", "4", "2", "5", "2", "5", "2", "…
$ PrincipalSourceOfImpact <chr> NA, "Nodata", "Agriculture", "Looting", "Agric…
$ AllNotes                <chr> "stones of the top", "", "part of the north  s…
$ geometry                <POINT [m]> POINT (441126.7 4710087), POINT (453727.…
$ distBG                  [m] <units[26 x 1]>
$ distTown                [m] 9730.7864 [m], 1806.6579 [m], 1303.8512 [m], 127…
$ distTownBoundary        [m] 635.3572 [m], 1806.6579 [m], 1303.8512 [m], 1276…
$ elevAster               <dbl> NA, 242.0523, 187.9270, 185.9483, 215.0000, 227.…
$ slopeAster              <dbl> NA, 3.527693e+00, 3.088841e+00, 3.120401e+00, …
$ aspectAster             <dbl> NA, 133.30900, 57.23311, 53.89560, 53.73102, 1…
$ TRI                     <dbl> NA, 1.29536438, 1.18449783, 1.00713158, 0.1759…
$ TPI                     <dbl> NA, 5.708199e-01, -2.686462e-01, -9.536743e-06…
$ rough                   <dbl> NA, 4.6093445, 3.8489990, 4.0048981, 0.8009949…
$ prom250mbuff            <dbl> NA, 7.17, 43.48, 12.04, 11.15, 67.27, 77.14, 1…
features <- features %>%
    dplyr::mutate(Condition = case_when(Condition == 0 ~ "NA", Condition == 6 ~ "5",
        Condition != 0 ~ Condition)) %>%
    dplyr::mutate(Condition = as.factor(Condition)) %>%
    dplyr::mutate(TypeCertainty = case_when(grepl("\\?", Type) ~ "Uncertain", !grepl("\\?",
        Type) ~ "Certain")) %>%
    dplyr::mutate(Type = gsub("\\?", "", Type)) %>%
    dplyr::mutate(DiameterMax = as.numeric(DiameterMax))

levels(features$Condition) = c(1, 2, 3, 4, 5, NA)

features %>%
    group_by(Type) %>%
    tally()
Simple feature collection with 6 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 420515 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 6 × 3
  Type                     n                                            geometry
  <chr>                <int>                                    <MULTIPOINT [m]>
1 Burial Mound          1097 ((420515 4693580), (420707.1 4690851), (421010.1 4…
2 Extinct Burial Mound   198 ((420832.5 4689836), (421875.6 4693594), (422243.5…
3 Other                   71 ((426968.4 4693384), (462171.2 4655463), (462343 4…
4 Surface Scatter         64 ((421045.4 4690265), (424207.4 4694969), (427087.6…
5 Tell                     4 ((465676.3 4675875), (467848.2 4662406), (483395.3…
6 Uncertain Feature       32 ((437475.3 4690926), (445710.4 4699311), (457010.8…

Constrain to Yambol Province

# load the boundary
Y_region <- st_read("../input_data/Vectors/YamRegion.shp")
Reading layer `YamRegion' from data source 
  `C:\Users\Adela\Documents\RStudio\MoundMerging2023\input_data\Vectors\YamRegion.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 431400.8 ymin: 4641684 xmax: 504849.3 ymax: 4727299
Projected CRS: WGS 84 / UTM zone 35N
Y_features <- st_intersection(features, Y_region$geometry)
Y_features  # 1243 features and 24 fields
Simple feature collection with 1242 features and 24 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722257
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
  TopoID Note TRAP              Source         Type          LU_Around
2 200016 <NA> 9411              Survey Burial Mound              Scrub
3 200018 <NA> 9416 Legacy verification Burial Mound Annual agriculture
4 200019 <NA> 9417 Legacy verification Burial Mound            Pasture
              LU_Top       Date DiameterMax HeightMax Condition
2              Scrub 2010-11-07          15       1.5         2
3 Annual agriculture 2010-11-07          20       2.0         3
4            Pasture 2010-11-07          15       2.0         4
  PrincipalSourceOfImpact
2                  Nodata
3             Agriculture
4                 Looting
                                                      AllNotes       distBG
2                                                              61029.50 [m]
3                       part of the north  slope was ploughted 62182.78 [m]
4 S part was mainly ploughted, 50 % of the mound was destroyed 62264.91 [m]
      distTown distTownBoundary elevAster slopeAster aspectAster      TRI
2 1806.658 [m]     1806.658 [m]  242.0523   3.527693   133.30900 1.295364
3 1303.851 [m]     1303.851 [m]  187.9270   3.088841    57.23311 1.184498
4 1276.735 [m]     1276.735 [m]  185.9483   3.120401    53.89560 1.007132
            TPI    rough prom250mbuff TypeCertainty                 geometry
2  5.708199e-01 4.609344         7.17       Certain POINT (453727.1 4706737)
3 -2.686462e-01 3.848999        43.48       Certain POINT (452999.1 4707739)
4 -9.536743e-06 4.004898        12.04       Certain   POINT (452997 4707823)
 [ reached 'max' / getOption("max.print") -- omitted 7 rows ]

Within Yambol, we documented 1243 features (early observations).

Execute attribute changes following manual review

This chunk corrects poor data entries in the structured digital forms on the basis of photographs, diaries, and AKB records.

# Type Certainty UNCERTAIN TO CERTAIN
certain <- Y_features %>% 
  filter(TRAP %in% c(9838,9852,8766,8359)) %>% 
  mutate(TypeCertainty = "Certain" )

# Type Burial mound to EXTINCT
extinct <- Y_features %>% 
  filter(TRAP %in% c(9516, 8337, 8388, 8770, 9910, 9911)) %>% 
  mutate(Type = "Extinct Burial Mound" )

# Type Burial mound to Scatter
scatter <- Y_features %>% 
  filter(TRAP == 9883) %>% 
  mutate(Type = "Surface Scatter" )

# Type Burial mound to Other
other <- Y_features %>% 
  filter(TRAP %in% c(8427, 9645))%>% 
  mutate(Type = "Other")

# Swap Source in these two and TypeCertainty
m8763 <- Y_features %>% 
  filter(TRAP == 8763) %>% 
  mutate(Source = "Legacy verification")
m8764 <- Y_features %>% 
  filter(TRAP ==  8764) %>% 
  mutate(Source = "Survey", Type = "Extinct Burial Mound", TypeCertainty = "Uncertain")
  
fixes <- rbind(m8763,m8764, other,scatter,extinct,certain)

class(fixes$distBG)
[1] "matrix" "array" 
# Fix source in features with row_update()
colnames(Y_features)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distBG"                 
[15] "distTown"                "distTownBoundary"       
[17] "elevAster"               "slopeAster"             
[19] "aspectAster"             "TRI"                    
[21] "TPI"                     "rough"                  
[23] "prom250mbuff"            "TypeCertainty"          
[25] "geometry"               
colnames(fixes)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distBG"                 
[15] "distTown"                "distTownBoundary"       
[17] "elevAster"               "slopeAster"             
[19] "aspectAster"             "TRI"                    
[21] "TPI"                     "rough"                  
[23] "prom250mbuff"            "TypeCertainty"          
[25] "geometry"               
# row_update() fails on the units distBG and geometry columns so we eliminate these for a moment to apply fixes, and then rejoin them.  

features_fixed <- rows_update(
  st_drop_geometry(Y_features)[,-14],  #remove distBG
  st_drop_geometry(fixes)[,-14],
  by = "TRAP")

# rejoin geometry and distBG to cleaned source and type data
features_fixed <- features_fixed %>% 
  left_join(Y_features[,c("TRAP", "distBG")], by = "TRAP") %>% 
  st_as_sf()

features_fixed %>% mapview(zcol = "Type")
features_fixed %>% 
  dplyr::filter(Type == "Tell") %>% 
  dplyr::select(Date, TRAP, Source)
Simple feature collection with 4 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 465676.3 ymin: 4662406 xmax: 483400.7 ymax: 4697154
Projected CRS: WGS 84 / UTM zone 35N
        Date TRAP              Source                 geometry
1 2018-09-20 9802              Survey POINT (483395.3 4665848)
2 2022-09-13 9825 Legacy verification POINT (467848.2 4662406)
3 2022-09-22 8777 Legacy verification POINT (483400.7 4697154)
4 2022-09-28 9914 Legacy verification POINT (465676.3 4675875)

Fix missing data notation

AllNotes field and TopoID field sometimes contain zeroes or are outright blank to indicate missing data. Missing data is legitimate, as only Map symbols get TopoID and many mounds were first encountered during Survey. Likewise, AllNotes field is aggregated from annotations in the mobile app, which served to comment on structured data (dropdown options), indicating some deviation or addition. They do not exist everywhere. Let’s convert all these zeroes and blanks to NAs for consistency.

features_fixed$AllNotes <- ifelse(features_fixed$AllNotes == "0" | features_fixed$AllNotes ==
    "", NA, features_fixed$AllNotes)
features_fixed$TopoID <- ifelse(features_fixed$TopoID == "0", NA, features_fixed$TopoID)

Clean up after the changes

rm(certain, extinct, fixes, m8763, m8764, other, scatter)

AKB identifiers

Here we add AKB numbers to feature data so their future analyses can be brought to bear on the environmental attributes documented during survey.

1055 AKB records

We have 1055 AKB numbers among 1466 visited features. The mismatch is because some features were not deemed worthy of registering (extinct or overbuilt status, military bunkers. etc)

AKB <- read_csv("../input_data/MoundsAKBnumbers.csv")
sum(!is.na(AKB$AKB))
[1] 1055
names(AKB)
[1] "TRAP"  "AKB"   "Area"  "Notes"
head(AKB)
# A tibble: 6 × 4
   TRAP     AKB Area     Notes                  
  <dbl>   <dbl> <chr>    <chr>                  
1  8048 2700001 Borisovo <NA>                   
2  9200 2700063 Yambol   <NA>                   
3  8356 2700064 Yambol   <NA>                   
4  8351 2700065 Yambol   second TRAP number 9227
5  9227 2700065 Yambol   second TRAP number 8351
6  8353 2700066 Yambol   second TRAP number 9257
# Which ones are excavated?
AKB %>%
    filter(grepl("[Ee]xcavated", Notes))
# A tibble: 29 × 4
    TRAP     AKB Area     Notes                        
   <dbl>   <dbl> <chr>    <chr>                        
 1  8047 2700149 Borisovo Excavated by Neli Tancheva   
 2  8043 2700151 Borisovo Excavated by Daniela Agre    
 3  8348 2700220 Mogila   Excavated by Piotr - Rome    
 4  8357 2700221 Mogila   Excavated by Volker EBA      
 5  8345 2700222 Mogila   Excavated by Ilia EBA        
 6  8349 2700226 Mogila   Excavated by Piotr - Rome    
 7  6009 2700287 Boyanovo Excavated by Ilia Iliev EBA  
 8  9357 2700289 Boyanovo Excavated by Daniela Agre EBA
 9  7004 2790017 Boyanovo Excavated by Ilia Iliev      
10  7005 2790018 Boyanovo Excavated by Daniela Agre    
# ℹ 19 more rows

Join AKB to features

Y_features <- features_fixed %>%
    left_join(AKB, by = c("TRAP")) %>%
    rename(AKBNotes = Notes)

mapview(Y_features, zcol = "AKBNotes")
colnames(Y_features)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distTown"               
[15] "distTownBoundary"        "elevAster"              
[17] "slopeAster"              "aspectAster"            
[19] "TRI"                     "TPI"                    
[21] "rough"                   "prom250mbuff"           
[23] "TypeCertainty"           "distBG"                 
[25] "AKB"                     "Area"                   
[27] "AKBNotes"                "geometry"               
head(Y_features[, 10:27])
Simple feature collection with 6 features and 18 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 452997 ymin: 4706447 xmax: 454596 ymax: 4707823
Projected CRS: WGS 84 / UTM zone 35N
  HeightMax Condition PrincipalSourceOfImpact
1       1.5         2                  Nodata
2       2.0         3             Agriculture
3       2.0         4                 Looting
                                                      AllNotes     distTown
1                                                         <NA> 1806.658 [m]
2                       part of the north  slope was ploughted 1303.851 [m]
3 S part was mainly ploughted, 50 % of the mound was destroyed 1276.735 [m]
  distTownBoundary elevAster slopeAster aspectAster      TRI           TPI
1     1806.658 [m]  242.0523   3.527693   133.30900 1.295364  5.708199e-01
2     1303.851 [m]  187.9270   3.088841    57.23311 1.184498 -2.686462e-01
3     1276.735 [m]  185.9483   3.120401    53.89560 1.007132 -9.536743e-06
     rough prom250mbuff TypeCertainty       distBG      AKB            Area
1 4.609344         7.17       Certain 61029.50 [m] 10001492        Drazhevo
2 3.848999        43.48       Certain 62182.78 [m] 10001486 Hadzhidimitrovo
3 4.004898        12.04       Certain 62264.91 [m] 10001487 Hadzhidimitrovo
  AKBNotes                 geometry
1     <NA> POINT (453727.1 4706737)
2     <NA> POINT (452999.1 4707739)
3     <NA>   POINT (452997 4707823)
 [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
# Are there duplicate AKB numbers?
AKBduplicated <- Y_features %>%
    filter(!is.na(AKB)) %>%
    filter(duplicated(AKB)) %>%
    pull(AKB)

Y_features %>%
    dplyr::filter(AKB %in% AKBduplicated) %>%
    dplyr::select(TRAP, AKB)
Simple feature collection with 16 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 464347.7 ymin: 4661331 xmax: 492400 ymax: 4706973
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
   TRAP      AKB                 geometry
1  9222  2700226 POINT (464359.3 4706914)
2  9225 10007352 POINT (464401.3 4706968)
3  9365 10001274 POINT (474347.1 4671482)
4  9223  2700220 POINT (464354.2 4706928)
5  9224 10007351 POINT (464404.7 4706951)
6  8034 10009627 POINT (467198.8 4680951)
7  8346 10007351 POINT (464400.3 4706954)
8  8347 10007352 POINT (464396.8 4706973)
9  8348  2700220 POINT (464347.7 4706939)
10 8349  2700226 POINT (464358.7 4706919)

Beware: In 9 instances [10007351 10007352 2700220 2700226 10007031 10009627 10009305 10001274 10009811], AKB numbers are duplicated, meaning that 16 TRAP mounds share an AKB number.

How many features/mounds have been excavated in Yambol by 2023

Y_features %>%
    filter(AKB > 0) %>%
    filter(grepl("Excav", AKBNotes))  #  27 excavated in early dataset
Simple feature collection with 27 features and 27 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 447232.5 ymin: 4662878 xmax: 493995.2 ymax: 4715727
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
  TopoID Note TRAP              Source         Type LU_Around  LU_Top
1 200029 <NA> 9431 Legacy verification Burial Mound   Pasture Pasture
2 200080 <NA> 9430 Legacy verification Burial Mound   Pasture Pasture
        Date DiameterMax HeightMax Condition PrincipalSourceOfImpact
1 2010-11-10          20         2         5                 Looting
2 2010-11-10          10        NA         5                 Looting
                                                         AllNotes      distTown
1                                           one big trench  10x10 1006.1010 [m]
2 one  big trench 9x6, totally destroyed, check excavation report  966.9691 [m]
  distTownBoundary elevAster slopeAster aspectAster      TRI       TPI    rough
1    1006.1010 [m]  144.0196   3.487972   353.33450 1.409735 0.7014713 4.790833
2     966.9691 [m]  147.0415   3.966855    28.08201 1.574440 0.6184120 5.340210
  prom250mbuff TypeCertainty       distBG      AKB     Area
1        83.09       Certain 64932.45 [m] 10001377 Drazhevo
2        29.14       Certain 64845.70 [m] 10001376 Drazhevo
                 AKBNotes                 geometry
1 Excavated by Ilia Iliev POINT (455348.3 4711125)
2 Excavated by Ilia Iliev   POINT (455447 4711057)
 [ reached 'max' / getOption("max.print") -- omitted 8 rows ]
Y_features %>%
    filter(AKB > 0)  # 1030/1040 early/later features have AKB >> duplicates must be here
Simple feature collection with 1030 features and 27 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722135
Projected CRS: WGS 84 / UTM zone 35N
First 10 features:
  TopoID Note TRAP              Source         Type          LU_Around
1 200016 <NA> 9411              Survey Burial Mound              Scrub
2 200018 <NA> 9416 Legacy verification Burial Mound Annual agriculture
              LU_Top       Date DiameterMax HeightMax Condition
1              Scrub 2010-11-07          15       1.5         2
2 Annual agriculture 2010-11-07          20       2.0         3
  PrincipalSourceOfImpact                               AllNotes     distTown
1                  Nodata                                   <NA> 1806.658 [m]
2             Agriculture part of the north  slope was ploughted 1303.851 [m]
  distTownBoundary elevAster slopeAster aspectAster      TRI        TPI
1     1806.658 [m]  242.0523   3.527693   133.30900 1.295364  0.5708199
2     1303.851 [m]  187.9270   3.088841    57.23311 1.184498 -0.2686462
     rough prom250mbuff TypeCertainty       distBG      AKB            Area
1 4.609344         7.17       Certain 61029.50 [m] 10001492        Drazhevo
2 3.848999        43.48       Certain 62182.78 [m] 10001486 Hadzhidimitrovo
  AKBNotes                 geometry
1     <NA> POINT (453727.1 4706737)
2     <NA> POINT (452999.1 4707739)
 [ reached 'max' / getOption("max.print") -- omitted 8 rows ]

Fix column names for legibility

names(Y_features)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distTown"               
[15] "distTownBoundary"        "elevAster"              
[17] "slopeAster"              "aspectAster"            
[19] "TRI"                     "TPI"                    
[21] "rough"                   "prom250mbuff"           
[23] "TypeCertainty"           "distBG"                 
[25] "AKB"                     "Area"                   
[27] "AKBNotes"                "geometry"               
Y_features <- Y_features %>%
    rename(TopoNote = Note, LanduseAround = LU_Around, LanduseOnTop = LU_Top, DiameterMax_m = DiameterMax,
        HeightMax_m = HeightMax, DistTown_m = distTown, DistTownBoundary_m = distTownBoundary,
        DistBG_m = distBG, AKBArea = Area)

Export Yambol features, later and early

To export these properly, select your desired version to run the script with either the early or the later dataset. Also, mke sure to select the best format for your further processing: rds(R) or geojson(Python). Default is the _early version and rds format.

# Features in Yambol
names(Y_features)
 [1] "TopoID"                  "Note"                   
 [3] "TRAP"                    "Source"                 
 [5] "Type"                    "LU_Around"              
 [7] "LU_Top"                  "Date"                   
 [9] "DiameterMax"             "HeightMax"              
[11] "Condition"               "PrincipalSourceOfImpact"
[13] "AllNotes"                "distTown"               
[15] "distTownBoundary"        "elevAster"              
[17] "slopeAster"              "aspectAster"            
[19] "TRI"                     "TPI"                    
[21] "rough"                   "prom250mbuff"           
[23] "TypeCertainty"           "distBG"                 
[25] "AKB"                     "Area"                   
[27] "Notes"                   "geometry"               
glimpse(Y_features)
Rows: 1,243
Columns: 28
$ TopoID                  <dbl> 200016, 200018, 200019, 200020, 200022, 200023…
$ Note                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ TRAP                    <dbl> 9411, 9416, 9417, 9409, 9314, 9315, 9415, 9418…
$ Source                  <chr> "Survey", "Legacy verification", "Legacy verif…
$ Type                    <chr> "Burial Mound", "Burial Mound", "Burial Mound"…
$ LU_Around               <chr> "Scrub", "Annual agriculture", "Pasture", "Ann…
$ LU_Top                  <chr> "Scrub", "Annual agriculture", "Pasture", "Pas…
$ Date                    <date> 2010-11-07, 2010-11-07, 2010-11-07, 2010-11-0…
$ DiameterMax             <dbl> 15, 20, 15, 30, 30, 20, 25, 50, 15, 20, 10, 23…
$ HeightMax               <dbl> 1.5, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 6.0, 1.0, 2…
$ Condition               <fct> 2, 3, 4, 2, 5, 2, 5, 2, 3, 5, 5, 5, 5, 5, 1, 5…
$ PrincipalSourceOfImpact <chr> "Nodata", "Agriculture", "Looting", "Agricultu…
$ AllNotes                <chr> "", "part of the north  slope was ploughted", …
$ distTown                [m] 1806.65789 [m], 1303.85117 [m], 1276.73451 [m], …
$ distTownBoundary        [m] 1806.65789 [m], 1303.85117 [m], 1276.73451 [m], …
$ elevAster               <dbl> 242.0523, 187.9270, 185.9483, 215.0000, 227.64…
$ slopeAster              <dbl> 3.5276933, 3.0888411, 3.1204007, 0.5485047, 3.…
$ aspectAster             <dbl> 133.3090024, 57.2331062, 53.8955957, 53.731018…
$ TRI                     <dbl> 1.2953644, 1.1844978, 1.0071316, 0.1759815, 1.…
$ TPI                     <dbl> 5.708199e-01, -2.686462e-01, -9.536743e-06, 1.…
$ rough                   <dbl> 4.6093445, 3.8489990, 4.0048981, 0.8009949, 4.…
$ prom250mbuff            <dbl> 7.17, 43.48, 12.04, 11.15, 67.27, 77.14, 17.58…
$ TypeCertainty           <chr> "Certain", "Certain", "Certain", "Certain", "C…
$ distBG                  [m] <units[26 x 1]>
$ AKB                     <dbl> 10001492, 10001486, 10001487, 10001371, 10001488…
$ Area                    <chr> "Drazhevo", "Hadzhidimitrovo", "Hadzhidimitrov…
$ Notes                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "Excavated…
$ geometry                <POINT [m]> POINT (453727.1 4706737), POINT (452999.1 4707…
# Early version
Y_features %>%
    write_rds("../output_data/Y_features_dd_early.rds")

Y_features %>%
    st_write("../output_data/Y_features_dd_early.geojson", append = F)
Deleting layer not supported by driver `GeoJSON'
Deleting layer `Y_features_dd_early' failed
Writing layer `Y_features_dd_early' to data source 
  `../output_data/Y_features_dd_early.geojson' using driver `GeoJSON'
Updating existing layer Y_features_dd_early
Writing 1243 features with 27 fields and geometry type Point.
# Later version Y_features %>%
# write_rds('../output_data/Y_features_dd_later.rds') Y_features %>%
# st_write('../output_data/Y_features_dd_later.geojson', append = F)

Filter deduplicated features for mounds

Now that the attributes look reasonably well, let’s filter out and export the most likely mounds inside the Yambol Province.

Y_features %>%
    filter(grepl("Mound|Other|Uncertain", Type)) %>%
    group_by(Type) %>%
    tally()
Simple feature collection with 4 features and 2 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 433010.4 ymin: 4647347 xmax: 499612.3 ymax: 4722135
Projected CRS: WGS 84 / UTM zone 35N
# A tibble: 4 × 3
  Type                     n                                            geometry
  <chr>                <int>                                    <MULTIPOINT [m]>
1 Burial Mound           910 ((433010.4 4694365), (433295.2 4693180), (433954.3…
2 Extinct Burial Mound   168 ((433034.9 4693186), (433441.6 4692413), (435565.5…
3 Other                   72 ((445501 4697842), (462171.2 4655463), (462313 468…
4 Uncertain Feature       31 ((445710.4 4699311), (457010.8 4671808), (462076.8…

Export Yambol mounds, later and early

Moving to export, uncomment the relevant section of early or later variant. I recommend using the _later deduplicated variant as authoritative as the later observations are more “current”. But there is good reason to use the _early variant if you wish to look for change in observations between teams and seasons, or if you need to add AKB information - it is linked to the earlier IDs as it’s usually recorded upon first visit.

# early
Y_features %>%
    filter(grepl("[Mm]ound", Type)) %>%
    st_write("../output_data/Y_mounds_dd_early.geojson", append = FALSE)
Deleting layer not supported by driver `GeoJSON'
Deleting layer `Y_mounds_dd_early' failed
Writing layer `Y_mounds_dd_early' to data source 
  `../output_data/Y_mounds_dd_early.geojson' using driver `GeoJSON'
Updating existing layer Y_mounds_dd_early
Writing 1078 features with 27 fields and geometry type Point.
Y_features %>%
    filter(grepl("[Mm]ound", Type)) %>%
    write_rds("../output_data/Y_mounds_dd_early.rds")

# later Y_features %>% filter(grepl('[Mm]ound', Type)) %>%
# st_write('../output_data/Y_mounds_dd_later.geojson', append=FALSE)

# Y_features %>% filter(grepl('[Mm]ound', Type)) %>%
# write_rds('../output_data/Y_mounds_dd_later.rds')